Standards in Genomic Sciences (2014) 9:514-623



DOI:10.4056/sigs.4778605 



Draft genome sequence of Gluconobacter thailandicus
NBRC 3257

Minenosuke Matsutani, Haruo Suzuki, Toshiharu Yakushi, Kazunobu Matsushita

Department of Biological Chemistry, Faculty of Agriculture, Yamaguchi University,
Yamaguchi 753-8515, Japan

Department of Environmental Science and Engineering, Graduate School of Science and
Engineering, Yamaguchi University, 1677-1 Yoshida, Yamaguchi, Japan

Correspondence: Kazunobu Matsushita (kazunobu@yamaguchi-u.ac.jp) and Haruo Suzuki 
(haruo@g-language.org) 

Keywords: Acetic acid bacteria, Gluconobacter



Gluconobacter thailandicus strain NBRC 3257, isolated from downy cherry (Prunus
tomentosa), is a strictly aerobic, rod-shaped, Gram-negative bacterium. Here, we report the
features of this organism, together with the draft genome sequence and annotation. The draft
genome sequence is composed of 107 contigs for 3,446,046 bp with 56.17% G+C content and
contains 3,360 protein-coding genes and 54 RNA genes.



Abbreviations: DDBJ- DNA Data Bank of Japan, EMBL- European Molecular Biology Laboratory,
NCBI- National Center for Biotechnology Information, AAB- Acetic Acid Bacteria, NJ-
Neighbor joining, PQQ- Pyrroloquinoline quinone



Introduction 

Acetic acid bacteria (AAB) are strictly aerobic Alphaproteobacteria. AAB are well known for
their ability to incompletely oxidize a wide variety of sugars and alcohols. The genus
Gluconobacter oxidizes a wide range of sugars, sugar alcohols, and sugar acids, and can
accumulate a large amount of the corresponding oxidized products in the culture medium [1].
Thus, Gluconobacter strains are widely used for the industrial production of pharmaceutical
intermediates, such as L-sorbose (vitamin C synthesis), 6-amino-L-sorbose (synthesis of the
antidiabetic drug miglitol), and dihydroxyacetone (cosmetics for sunless tanning) [1].
Furthermore, the genera Acetobacter and Gluconacetobacter are widely used for the industrial
production of vinegar because of their high ethanol oxidation ability [2].

To date, six genome sequences of Gluconobacter strains (Gluconobacter oxydans 621H,
Gluconobacter oxydans H24, Gluconobacter oxydans WSH-003, Gluconobacter thailandicus
NBRC 3255, Gluconobacter frateurii NBRC 101659, and Gluconobacter frateurii NBRC 103465)
are available in the public databases [3-8]. These genomic data are useful for the
experimental identification of unique proteins or estimation of the phylogenetic
relationships among related strains [9-11].

Gluconobacter thailandicus NBRC 3257 was isolated from downy cherry (Prunus tomentosa) in
Japan [12], and identified based on its 16S rRNA sequence [13]. Here, we present a summary
of the classification and a set of features of G. thailandicus NBRC 3257, together with a
description of the draft genome sequencing and annotation.

Classification and features 

A representative genomic 16S rRNA sequence of G. thailandicus NBRC 3257 was compared to the
16S rRNA sequences of the type strains of all known Gluconobacter species. The 16S rRNA gene
sequence identities between G. thailandicus NBRC 3257 and the type strains of the other
Gluconobacter species were 97.58-99.85%. The type strains exhibiting the highest sequence
identities to NBRC 3257 were Gluconobacter frateurii NBRC 3264T and Gluconobacter japonicus
NBRC 3271T. Figure 1 shows the phylogenetic relationships of G. thailandicus NBRC 3257 to
other Gluconobacter species in a 16S rRNA-based tree. All the type strains and ten strains
of G. thailandicus, including NBRC 3257, were used for the analysis [13,17]. Based on this
tree, the genus Gluconobacter is divided into two sub-groups. Gluconobacter wancherniae,
Gluconobacter cerinus, G. frateurii, G. japonicus, Gluconobacter nephelii, and
G. thailandicus are classified as clade 1. On the other hand, Gluconobacter kondonii,
Gluconobacter sphaericus, Gluconobacter albidus, Gluconobacter kanchanaburiensis,
Gluconobacter uchimurae, Gluconobacter roseus, and Gluconobacter oxydans belong to clade 2.
All ten G. thailandicus strains are closely related to each other, and their 16S rRNA
sequences are 100% identical.
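
As a rough illustration of how a pairwise identity value such as those quoted above can be derived from an alignment, the following Python sketch counts matching columns between two pre-aligned sequences. The function name `percent_identity`, the rule of skipping columns that contain a gap, and the toy sequences are assumptions made for illustration; they are not the procedure or the data used in this study.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are excluded from the count
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not real 16S rRNA data)
seq_a = "ACGTTGCA-ACGT"
seq_b = "ACGTTGCATACGA"
print(f"{percent_identity(seq_a, seq_b):.2f}%")
```

Other conventions (for example, counting gapped columns as mismatches) give slightly different values, so the gap rule should be stated whenever identities are compared across studies.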



Although the ability to oxidize ethanol is a typical feature of AAB, NBRC 3257 notably
lacks this ability because it is missing the cytochrome subunit of the alcohol
dehydrogenase complex, which functions as the primary dehydrogenase in the ethanol oxidase
respiratory chain [18]. Despite its inability to oxidize ethanol, NBRC 3257 can efficiently
oxidize many sugars and sugar alcohols, such as pentitols, D-sorbitol, D-mannitol,
glycerol, meso-erythritol, and 2,3-butanediol [19]. Thus, G. thailandicus NBRC 3257 has
unique characteristics and the potential for the industrial production of many different
oxidized products useful as drug intermediates or commodity chemicals.

Figure 1. Phylogenetic tree highlighting the phylogenetic position of Gluconobacter
thailandicus NBRC 3257 relative to other type strains within the genus Gluconobacter. To
construct the phylogenetic tree, these sequences were collected and nucleotide sequence
alignment was carried out using CLUSTALW [14]. We used the MEGA version 5.05 package to
generate phylogenetic trees based on 16S rRNA genes with the neighbor-joining (NJ) approach
and 1,000 bootstrap replicates [15,16]. Acetobacter aceti NBRC 14818 (X74066) was used as
the outgroup.



G. thailandicus NBRC 3257 is a strictly aerobic, mesophilic (temperature optimum ~30 °C)
organism. A differential interference contrast image of G. thailandicus NBRC 3257 cells
grown on mannitol medium (25 g of D-mannitol, 5 g of yeast extract, and 3 g of polypeptone
per liter) is shown in Figure 2(A). The cells are short rods, 2.6 ± 0.6 μm (mean ± SD,
n = 10) in length and 1.2 ± 0.1 μm (mean ± SD, n = 10) in width. The flagella stained by
the modified Ryu method are shown in Figure 2(B) and Figure 2(C) [20]. Singly and multiply
flagellated cells were observed frequently. The characteristic features are shown in
Table 1.



Table 1. Classification and general features of Gluconobacter thailandicus NBRC 3257 according to the
MIGS recommendations [21]

MIGS ID     Property                 Term                                 Evidence code a)
            Current classification   Domain Bacteria                      TAS [22]
                                     Phylum Proteobacteria                TAS [23]
                                     Class Alphaproteobacteria            TAS [24,25]
                                     Order Rhodospirillales               TAS [26,27]
                                     Family Acetobacteraceae              TAS [28,29]
                                     Genus Gluconobacter                  TAS [27,30,31]
                                     Species Gluconobacter thailandicus   TAS [17,32]
                                     Strain NBRC 3257                     TAS [23]
            Gram stain               Negative                             IDA
            Cell shape               Rod-shaped                           IDA
            Motility                 Motile                               NAS
            Sporulation              Not reported                         NAS
            Temperature range        Mesophilic                           IDA
            Optimum temperature      30°C                                 IDA
            Carbon source            Glucose and/or glycerol              IDA
            Energy source            Glucose and/or glycerol              NAS
MIGS-6      Habitat                  Free living                          NAS
MIGS-6.3    Salinity                 Not reported                         NAS
MIGS-22     Oxygen                   Strict aerobes                       IDA
MIGS-15     Biotic relationship      Fruits and Flower                    IDA
MIGS-14     Pathogenicity            Non-pathogenic                       IDA
MIGS-4      Geographic location      Japan                                NAS
MIGS-5      Sample collection time   1954                                 NAS
MIGS-4.1    Latitude                 Not reported                         NAS
MIGS-4.2    Longitude                Not reported                         NAS
MIGS-4.3    Depth                    Not reported                         NAS
MIGS-4.4    Altitude                 Not reported                         NAS

a) Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists
in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated
sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes
are from the Gene Ontology project [33]. If the evidence code is IDA, then the property should have been
directly observed, for the purpose of this specific publication, for a live isolate by one of the authors, or an
expert or reputable institution mentioned in the acknowledgements.




Figure 2. Cell morphology and flagella of G. thailandicus NBRC 3257. (A) Differential
interference contrast image of G. thailandicus NBRC 3257 grown on mannitol medium. Bar,
5 μm. (B and C) Microscopic images of flagella stained by the modified Ryu method. Singly
(B) and multiply (C) flagellated cells were observed. Bars, 5 μm.

Genome sequencing information

Genome project history

This genome was selected for sequencing on the basis of its phylogenetic position and 16S
rRNA similarity to other members of the genus Gluconobacter. This Whole Genome Shotgun
project has been deposited at DDBJ/EMBL/GenBank under the accession BASM00000000. The
version described in this paper is the first version, BASM01000000, and the sequence
consists of 107 contigs. Table 2 presents the project information and its association with
MIGS version 2.0 compliance [34].

Growth conditions and DNA isolation 

The culture of strain NBRC 3257 used to prepare genomic DNA for sequencing was a laboratory
stock grown on AP medium [35] at 30°C with vigorous shaking. The genomic DNA was isolated
as described in [36] with some modifications [35]. Three ml of culture broth was used to
isolate DNA, and the final DNA preparation was dissolved in a solution of 10 mM Tris-HCl
(pH 8.0) and 1 mM ethylenediaminetetraacetic acid. The purity, quality, and size of the
genomic DNA preparation were analyzed by Hokkaido System Science Co., Ltd. (Japan) using a
spectrophotometer, agarose gel electrophoresis, and Qubit (Invitrogen, Carlsbad, CA)
according to their guidelines.

Genome sequencing and assembly 

The genome of G. thailandicus NBRC 3257 was sequenced using the Illumina HiSeq 2000
sequencing platform by the paired-end strategy (2x100 bp). Paired-end genome fragments were
annealed to the flow-cell surface in a cluster station (Illumina). A total of 100 cycles of
sequencing-by-synthesis were performed and high-quality sequences were retained for further
analysis. The final coverage reached 358-fold for an estimated genome size of 3.44 Mb. The
sequence data from the Illumina HiSeq 2000 were assembled with Velvet ver. 1.2.07 [37]. The
final assembly yielded 107 contigs with a total genome size of 3.44 Mb. The contigs were
ordered against the complete genome of G. oxydans 621H [3] using Mauve [38-40].
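
Fold coverage is the number of sequenced bases divided by the genome size. The Python sketch below uses the reported 358-fold coverage, the 100-bp read length, and the 3,446,046-bp assembly size to back-calculate roughly how many reads that coverage implies. The read count itself is not reported in this paper; the result is only an estimate under the assumption that every retained read is a full-length 100-bp read, and the function names are hypothetical.

```python
import math


def fold_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def reads_for_coverage(coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the requested average depth."""
    return math.ceil(coverage * genome_size / read_length)


GENOME_SIZE_BP = 3_446_046   # assembly size reported in this paper
READ_LENGTH_BP = 100         # 2x100 bp paired-end reads
REPORTED_COVERAGE = 358      # fold coverage reported in this paper

# Assumption: all retained reads are full-length 100-bp reads.
n_reads = reads_for_coverage(REPORTED_COVERAGE, READ_LENGTH_BP, GENOME_SIZE_BP)
print(f"approx. reads needed: {n_reads:,}")
print(f"coverage check: {fold_coverage(n_reads, READ_LENGTH_BP, GENOME_SIZE_BP):.1f}x")
```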

Genome annotation 

Protein-coding genes (ORFs) of draft genome assemblies were predicted using Glimmer version
3.02 with a self-training dataset [41,42]. tRNAs and rRNAs were predicted using ARAGORN and
RNAmmer, respectively [43,44]. Functional assignments of the predicted ORFs were based on a
BLASTP homology search against two genome sequences, G. thailandicus NBRC 3255 and
G. oxydans 621H, and the NCBI nonredundant (NR) database [45]. Functional assignment was
also performed with a BLASTP homology search against Clusters of Orthologous Groups (COG)
databases [46].
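
Homology-based assignment of this kind usually ends with choosing one best hit per predicted ORF. The Python sketch below shows one way to do that from BLAST+ tabular output (`-outfmt 6`, default 12 columns), keeping the highest bit-score hit per query among hits that pass an E-value cutoff. The cutoff of 1e-5, the query and subject identifiers, and the example rows are all hypothetical; the paper does not state the thresholds or selection rule that were actually used.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6)
FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(tabular_text: str, max_evalue: float = 1e-5) -> dict:
    """Return the highest-bit-score hit per query among hits passing the E-value cutoff."""
    best = {}
    reader = csv.DictReader(io.StringIO(tabular_text), fieldnames=FIELDS, delimiter="\t")
    for row in reader:
        evalue = float(row["evalue"])
        bitscore = float(row["bitscore"])
        if evalue > max_evalue:
            continue  # hit too weak under the assumed cutoff
        current = best.get(row["qseqid"])
        if current is None or bitscore > current["bitscore"]:
            best[row["qseqid"]] = {
                "sseqid": row["sseqid"],
                "pident": float(row["pident"]),
                "evalue": evalue,
                "bitscore": bitscore,
            }
    return best


# Hypothetical rows; identifiers and scores are invented for illustration.
rows = [
    ["orf_0001", "subject_A", "62.5", "200", "75", "0", "1", "200", "5", "204", "1e-40", "150"],
    ["orf_0001", "subject_B", "98.0", "200", "4", "0", "1", "200", "1", "200", "1e-110", "400"],
    ["orf_0002", "subject_C", "30.0", "50", "35", "0", "1", "50", "10", "59", "0.5", "25"],
]
example = "\n".join("\t".join(r) for r in rows)

for query, hit in best_hits(example).items():
    print(query, hit["sseqid"], hit["bitscore"])
```

ORFs with no hit below the cutoff (here `orf_0002`) are simply absent from the result, which corresponds to leaving them annotated as hypothetical.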



Table 2. Project information

MIGS ID      Property                Term
MIGS-31      Finishing quality       Draft
MIGS-28      Libraries used          Illumina Paired-End library
MIGS-29      Sequencing platforms    Illumina HiSeq 2000
MIGS-31.2    Fold coverage           358 ×
MIGS-30      Assemblers              Velvet ver. 1.2.07
MIGS-32      Gene calling method     Glimmer ver. 3.02
             DDBJ ID                 BASM00000000
             DDBJ Date of Release    August 08, 2013
             Project relevance       Industrial

Genome properties

The genome of G. thailandicus NBRC 3257 is 3,446,046 bp long (107 contigs) with a 56.17%
G+C content (Table 3). Of the 3,414 predicted genes, 3,360 were protein-coding genes, and
54 were RNA genes (3 rRNA genes and 51 tRNA genes). A total of 2,249 genes (66.93% of the
protein-coding genes) were assigned a putative function. The remaining genes were annotated
as hypothetical genes. The properties and statistics of the genome are summarized in
Table 3. The distribution of genes into COG functional categories is presented in Table 4.
Of the 3,360 proteins, 2,669 (79%) were assigned to COG functional categories. Of these,
245 proteins were assigned to multiple COG categories. The most abundant COG category was
"General function prediction only" (342 proteins) followed by "Amino acid transport and
metabolism" (247 proteins), "Function unknown" (232 proteins), "Cell
wall/membrane/envelope biogenesis" (220 proteins), "Inorganic ion transport and
metabolism" (210 proteins), and "Replication, recombination and repair" (201 proteins).
The genome map of G. thailandicus NBRC 3257 is illustrated in Figure 3, which shows that
the GC skew shifts from negative to positive along the ordered set of contigs, with some
exceptions. This suggests that the contigs of the draft genome were ordered almost
correctly.

Gene repertoire of the G. thailandicus NBRC 3257 genome

Annotation of the genome indicated that NBRC 3257 has the membrane-bound PQQ-dependent
alcohol dehydrogenase adhAB operon (locus_tag NBRC3257_1377 and NBRC3257_1378) and adh
subunit III (NBRC3257_1024). A unique orphan gene of adh subunit I was also identified
(NBRC3257_3117). The gene repertoires of other membrane-bound PQQ-dependent proteins were
investigated. Homologous proteins of membrane-bound PQQ-dependent dehydrogenase
(NBRC3257_0292), membrane-bound glucose dehydrogenase (PQQ) (NBRC3257_0371), PQQ-dependent
dehydrogenase 4 (NBRC3257_0662), and PQQ-dependent dehydrogenase 3 (NBRC3257_1743) were
identified. In addition, two paralogous copies of the PQQ-glycerol dehydrogenase sldAB
operon (NBRC3257_0924 to NBRC3257_0925 and NBRC3257_1134 to NBRC3257_1135) were identified.

It has been thought that the respiratory chains of Gluconobacter species play key roles in
respiratory energy metabolism [48-51]. Therefore, the gene repertoires of the respiratory
chains of NBRC 3257 were also investigated. Besides two type II NADH dehydrogenase homologs
(NBRC3257_1995 and NBRC3257_2785) [51], a proton-pumping NADH:ubiquinone oxidoreductase
operon (type I NADH dehydrogenase complex) (NBRC3257_2617 to NBRC3257_2629), a cytochrome o
ubiquinol oxidase cyoBACD operon (NBRC3257_2304 to NBRC3257_2307), and a
cyanide-insensitive terminal oxidase cioAB operon (NBRC3257_0388 to NBRC3257_0389) [48,49]
were identified.



Table 3. Nucleotide content and gene count levels of the genome

Attribute                  Value        % of total a)
Genome size (bp)           3,446,046
DNA coding region (bp)     3,118,161    90.48
DNA G+C content (bp)       1,935,814    56.17
Total genes                3,414        100
RNA genes                  54           1.58
Protein-coding genes       3,360        98.42
Genes assigned to COGs     2,669        78.17

a) The total is based on either the size of the genome in base pairs or the total
number of protein coding genes in the annotated genome.
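
The percentages in Table 3 can be recomputed from the counts, which also makes the denominators explicit. In the Python sketch below the counts are copied from Table 3; the choice of denominators (genome size for the base-pair rows, total genes for the gene rows) is an assumption inferred from the values themselves. Recomputed figures may differ from the table in the last decimal place depending on how the original values were rounded.

```python
# Counts copied from Table 3
GENOME_SIZE_BP = 3_446_046
CODING_BP = 3_118_161
GC_BP = 1_935_814
TOTAL_GENES = 3_414
RNA_GENES = 54
PROTEIN_GENES = 3_360
COG_GENES = 2_669


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


# Assumed denominators: genome size for bp rows, total genes for gene rows.
recomputed = {
    "DNA coding region": pct(CODING_BP, GENOME_SIZE_BP),
    "DNA G+C content": pct(GC_BP, GENOME_SIZE_BP),
    "RNA genes": pct(RNA_GENES, TOTAL_GENES),
    "Protein-coding genes": pct(PROTEIN_GENES, TOTAL_GENES),
    "Genes assigned to COGs": pct(COG_GENES, TOTAL_GENES),
}
for name, value in recomputed.items():
    print(f"{name}: {value:.2f}%")

# The 79% quoted in the text uses protein-coding genes as the denominator instead.
cog_share_of_proteins = pct(COG_GENES, PROTEIN_GENES)
print(f"COG-assigned share of protein-coding genes: {cog_share_of_proteins:.2f}%")
```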

Figure 3. Graphical circular map of a simulated draft Gluconobacter thailandicus NBRC 3257 genome. The
simulated genome is a set of contigs ordered against the complete genome of G. oxydans 621H [3] using Mauve
[38-40]. The circular map was generated using CGView [47]. From the outside to the center: genes on forward
strand, genes on reverse strand, GC content, GC skew.

Table 4. Number of genes associated with the 25 general COG functional categories

Code    Value    %age a)    Description
J       157      4.67       Translation, ribosomal structure and biogenesis
A       0        0.00       RNA processing and modification
K       190      5.65       Transcription
L       201      5.98       Replication, recombination and repair
B       0        0.00       Chromatin structure and dynamics
D       28       0.83       Cell cycle control, cell division, chromosome partitioning
Y       0        0.00       Nuclear structure
V       44       1.31       Defense mechanisms
T       92       2.74       Signal transduction mechanisms
M       220      6.55       Cell wall/membrane/envelope biogenesis
N       43       1.28       Cell motility
Z       0        0.00       Cytoskeleton
W       2        0.06       Extracellular structures
U       95       2.83       Intracellular trafficking, secretion, and vesicular transport
O       120      3.57       Posttranslational modification, protein turnover, chaperones
C       170      5.06       Energy production and conversion
G       194      5.77       Carbohydrate transport and metabolism
E       247      7.35       Amino acid transport and metabolism
F       89       2.65       Nucleotide transport and metabolism
H       129      3.84       Coenzyme transport and metabolism
I       91       2.71       Lipid transport and metabolism
P       210      6.25       Inorganic ion transport and metabolism
Q       64       1.90       Secondary metabolites biosynthesis, transport and catabolism
R       342      10.18      General function prediction only
S       232      6.90       Function unknown
-       691      20.57      No COG assignment
-       245      7.29       Multiple COG assignment

a) The total is based on the total number of protein coding genes in the annotated genome.